Common differentially expressed genes and pathways correlating both coronary artery disease and atrial fibrillation

Coronary artery disease (CAD) and atrial fibrillation (AF) share common risk factors, such as hypertension and diabetes. Patients with CAD often suffer from concomitant AF, but how the two diseases interact with each other at the cellular and molecular levels remains largely unknown. The present study aims to dissect the common differentially expressed genes (DEGs) that are concurrently associated with CAD and AF. Two datasets (GSE71226 for CAD and GSE31821 for AF) were analyzed with GEO2R and Venn Diagram to identify the DEGs. Signaling pathways, gene enrichments, and protein-protein interactions (PPI) of the identified common DEGs were further analyzed with the Kyoto Encyclopedia of Genes and Genomes (KEGG), the Database for Annotation, Visualization and Integrated Discovery (DAVID), and the Search Tool for the Retrieval of Interacting Genes (STRING). In total, 565 up- and 1367 down-regulated genes in GSE71226 and 293 up- and 68 down-regulated genes in GSE31821 were identified. Among those, 21 common DEGs were discovered from both datasets, which led to the findings of 4 CAD and 21 AF pathways, 3 significant gene enrichments (intracellular cytoplasm, protein binding, and vascular labyrinthine layer), and 3 key proteins (membrane metallo-endopeptidase (MME), transferrin receptor 1 (TfR1), and lysosome-associated membrane glycoprotein 1 (LAMP1)). Together, these data imply that these three proteins may play a central role in the development of both CAD and AF.


INTRODUCTION
Cardiovascular disease is the leading cause of death in developed countries (Virani et al., 2020). Of all cardiovascular diseases (e.g., acute myocardial infarction, heart failure, valvular heart disease, cerebrovascular accident, transient ischemic attack, peripheral arterial disease, sudden cardiac arrest, ventricular arrhythmia, venous thromboembolism, and pulmonary embolism), coronary artery disease (CAD) is the most common type and accounts for the highest death rate (Michniewicz et al., 2018; Virani et al., 2020), whereas of all cardiac arrhythmias (e.g., supraventricular tachycardia, ventricular tachycardia, sinus-node dysfunction, and heart block), atrial fibrillation (AF) is the most typical disorder, affecting about 37.6 million individuals globally in 2017 (Go et al., 2001; Michniewicz et al., 2018; Virani et al., 2020).
Interestingly, studies have found that AF is strongly associated with an increased risk of many other conditions, such as CAD, stroke, heart failure, diabetes, sudden cardiac death, and mortality, especially in aging populations (Motloch et al., 2017; Murakami et al., 2017; Virani et al., 2020). In the case of CAD, it was demonstrated that AF and CAD share the same risk factors and affect each other (Kristensen et al., 2020; Lieder et al., 2018; Motloch et al., 2017). A systematic review and meta-analysis of 15 cohort studies, for example, demonstrated that AF was associated with a 1.54-fold increased risk of myocardial infarction induced by CAD (Ruddox et al., 2017). Overall, about 17-46.5% of patients with AF suffer from concomitant CAD, while patients with CAD have a low prevalence (0.2% to 5%) of AF, suggesting significant effects of AF on promoting the morbidity and mortality of concomitant diseases (Michniewicz et al., 2018).
Meanwhile, the outcomes of patients with CAD are modulated by AF; however, it is still unclear whether the presence of CAD simply increases the risk of AF or changes the impact of other risk factors (Mehta et al., 2003; Pilgrim et al., 2013). The management of AF with concomitant CAD remains a major clinical challenge (Gladding et al., 2020). Fully understanding the similarities in the pathogenesis of AF and CAD may reveal the mechanisms underlying both diseases and facilitate the discovery of new therapeutic targets.
Bioinformatic analysis of gene profiles offers a novel approach to explore the underlying mechanisms of disease at the molecular level. This technique has been widely utilized in basic and clinical studies (Kumar et al., 2016), yet only limited data have been reported regarding the interlinkages of critical genes and signaling molecules between CAD and AF (Kertai et al., 2015). In this paper, we aimed to profile the common differentially expressed genes (DEGs) of CAD and AF using the expression datasets of these two diseases and to identify potential pathways modulating the development of CAD and AF.

Data sources
The datasets of interest, with accession numbers GSE71226 and GSE31821, were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). The GSE71226 microarray dataset included 3 samples from patients with CAD and 3 samples from healthy subjects, while the GSE31821 dataset included 4 samples from patients with AF and 2 samples from healthy subjects. Both datasets were collected on the GPL570 platform ([HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array). Detailed information is shown in Table 1.
Table 1 (partially recovered).
Sample source: peripheral blood (GSE71226); cardiac (atrial) tissue (GSE31821)
Number of subjects: 6 (3 patients + 3 controls) (GSE71226); 6 (4 patients + 2 controls) (GSE31821)

Identifications of differentially expressed genes
DEGs between patients and healthy subjects were identified with the GEO2R online tool (log2FC > 1 or log2FC < -1, p value < 0.05) (Davis and Meltzer, 2007). The raw data were then run in Venn Diagram (http://bioinformatics.psb.ugent.be/webtools/Venn/) to identify the common DEGs between the 2 datasets. DEGs with log2FC < -1 were considered down-regulated genes, while DEGs with log2FC > 1 were regarded as up-regulated genes. A heatmap of gene expression was made with the R package ggplot2 as described previously (Aibar et al., 2015; Walter et al., 2015).
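The thresholding and Venn overlap described above can be sketched in a few lines of Python. This is only an illustration, not the GEO2R or Venn Diagram implementation: the `GeneStat` record, the `split_degs` helper, and the toy gene names and values are all hypothetical, and the sketch assumes that a "common" DEG is one that changes in the same direction in both datasets. A log2FC cutoff of 1 corresponds to at least a twofold change in expression.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class GeneStat(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    symbol: str
    log2fc: float
    p_value: float


def split_degs(rows: Iterable[GeneStat],
               lfc_cutoff: float = 1.0,
               p_cutoff: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Return (up, down) gene-symbol sets that pass both thresholds."""
    up, down = set(), set()
    for r in rows:
        if r.p_value >= p_cutoff:
            continue  # not significant at the chosen p cutoff
        if r.log2fc > lfc_cutoff:
            up.add(r.symbol)
        elif r.log2fc < -lfc_cutoff:
            down.add(r.symbol)
    return up, down


# Toy rows for illustration only; NOT values from GSE71226 or GSE31821.
cad_rows = [
    GeneStat("GENE_A", 2.3, 0.001),
    GeneStat("GENE_B", -1.8, 0.010),
    GeneStat("GENE_C", 0.4, 0.002),  # fails the fold-change cutoff
    GeneStat("GENE_D", 1.5, 0.200),  # fails the p-value cutoff
]
af_rows = [
    GeneStat("GENE_A", 1.2, 0.030),
    GeneStat("GENE_B", -1.1, 0.040),
    GeneStat("GENE_E", 3.0, 0.001),
]

cad_up, cad_down = split_degs(cad_rows)
af_up, af_down = split_degs(af_rows)

# The Venn overlap is a set intersection, taken per direction.
common_up = cad_up & af_up
common_down = cad_down & af_down
print(sorted(common_up), sorted(common_down))
```

Keeping the up- and down-regulated sets separate makes it easy to report the overlap per direction, as is done for the common DEGs in this study.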

Gene ontology enrichment analysis
Gene Ontology (GO) analysis has become a common way to analyze large-scale genomic data (Zheng et al., 2008). The Kyoto Encyclopedia of Genes and Genomes (KEGG) (https://www.genome.jp/kegg/) is a biological genomic database that focuses on computerization of molecular linkage among genomes, gene functions, and biochemical (metabolic and regulatory) pathways of all organisms under normal and disease conditions (Ogata et al., 1999). The Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/) and Cytoscape software (https://cytoscape.org/) were used for the GO enrichment and KEGG pathway analysis of the integrated DEGs. DAVID (v6.8) is an online bioinformatic tool designed to identify gene and protein functions and visualize different signaling pathways. In these analyses, bar plots were made with the R package heatmap to show the ten most significantly enriched GO terms (Aibar et al., 2015; Walter et al., 2015).
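To make the idea of term enrichment concrete, the sketch below computes a one-sided hypergeometric (over-representation) p-value for a single term. It is an illustration under stated assumptions rather than DAVID's procedure: DAVID reports a modified Fisher exact p-value (the EASE score), whereas this example uses the plain hypergeometric tail. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_term: int,
                           n_list: int, n_overlap: int) -> float:
    """P(X >= n_overlap) for a gene list drawn from a background.

    n_background: genes in the background set
    n_term:       background genes annotated with the term
    n_list:       genes in the submitted list (e.g., the DEGs)
    n_overlap:    submitted genes annotated with the term
    """
    max_k = min(n_term, n_list)
    total = comb(n_background, n_list)
    # Sum the probability of every outcome at least as extreme as observed.
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_list - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


# Hypothetical counts: 21 listed genes, 3 of which carry a term that
# annotates 200 of 20000 background genes.
p = hypergeom_enrichment_p(20000, 200, 21, 3)
print(f"enrichment p-value: {p:.3g}")
```

A small p-value indicates that the term is annotated to more of the listed genes than would be expected by chance given the background.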

Protein-protein interaction network mapping
The online Search Tool for the Retrieval of Interacting Genes (STRING) (https://string-db.org/) was used to analyze the protein-protein interaction (PPI) network of the DEGs as described previously (Szklarczyk et al., 2015). Because PPIs are known to modulate a variety of biological processes, such as cellular metabolism, developmental processes, and cell-to-cell interactions, the PPI network can be used to predict key protein(s) that regulate specific cellular functions or to screen for potential therapeutic target(s).
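One simple way to nominate candidate key proteins from an interaction network is to rank nodes by degree, that is, by their number of distinct interaction partners. The sketch below illustrates this on a toy edge list. The ranking criterion is an assumption made for illustration, since the text does not state how key nodes were scored, and the protein names `P1` to `P5` and the helper functions are hypothetical rather than STRING or Cytoscape output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def node_degrees(edges: Iterable[Edge]) -> Dict[str, int]:
    """Count distinct interaction partners for each protein."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(partners) for node, partners in neighbors.items()}


def top_hubs(edges: Iterable[Edge], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n highest-degree nodes, ties broken alphabetically."""
    degrees = node_degrees(edges)
    return sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical interactions for illustration only.
toy_edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]

print(top_hubs(toy_edges, n=2))
```

Degree is only one of several centrality measures; others, such as betweenness, can rank the same network differently.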

Up- and down-regulated genes that were concurrently expressed in patients with CAD and AF
To outline the profiles of DEGs, the two datasets were analyzed with GEO2R. From the GSE71226 dataset of patients with CAD and healthy subjects, a total of 1932 DEGs were identified, among which 565 genes were up-regulated (p < 0.05, log2FC > 1) and 1367 genes were down-regulated (p < 0.05, log2FC < -1) (Supplementary Table 2 and Figure 1). Similarly, from the GSE31821 dataset of patients with AF and healthy subjects, a total of 361 DEGs were extracted, among which 293 genes were up-regulated (p < 0.05, log2FC > 1) and 68 genes were down-regulated (p < 0.05, log2FC < -1) (Supplementary Table 3 and Figure 1), indicating that a large number of genes are differentially expressed in patients with CAD and AF.
To further determine the common DEGs that exist in both datasets, we ran the two datasets through Venn Diagram and confirmed 21 common DEGs, comprising 14 up-regulated genes (p < 0.05, log2FC > 1) and 7 down-regulated genes (p < 0.05, log2FC < -1) (Table 2 and Figure 1). To unveil the expression patterns of the DEGs among all groups, the top 100 DEGs were selected based on their p-values (p < 0.05) and plotted as a cluster heatmap to show the cross-correlation of those genes among individuals.
Figure 1 (caption fragment). Panel B shows 7 down-regulated genes from the GSE71226 dataset (in blue) and the GSE31821 dataset (in red). DEGs: differentially expressed genes. GSExxxx: GEO series accession number; Log2FC > 1 or < -1: fold changes in logarithms to base 2 between patients and healthy subjects are greater than 1 (up-regulated) or lower than -1 (down-regulated).
As shown in Figure 2, there are significant differences in gene expression profiles between healthy subjects and patients. Overall, the 5 healthy subjects from the two datasets exhibited similar patterns of gene expression except in groups 1 (G1), 3, 5, and 9, while significant differences in expression levels were observed in all groups (G1-9) between the 3 CAD patients and the 4 AF patients, with high expression in G1, 2, and 7 and low expression in G3, 4, 5, 6, 8, and 9 in the patients with CAD compared to the same groups in the patients with AF. Interestingly, STX17-AS1, BAG1, GYPC, STRADB, S100A9, and HBM are the most highly expressed genes in the CAD patients, while ACTA1, FHL2, FABP4, and EGR1 are the most highly expressed genes in the AF patients. The common DEGs between CAD and AF, such as STEAP4, SLC6A8, GYPC, and STRADB, appear in G1 and 2. Together, these data demonstrate that the majority of the DEGs are expressed differently between CAD and AF, but a small group of genes is expressed concurrently.

Variable and common GO terms between CAD and AF
To characterize the three GO categories, biological process (BP), molecular function (MF), and cellular component (CC), of the DEGs identified above, GO enrichment (i.e., overrepresentation or term enrichment) analysis was performed on the two datasets, and the results (i.e., terms) are presented as the graph (ontology) structure shown in Figure 3.
From the GSE71226 dataset of patients with CAD and healthy subjects, the DEGs were mostly enriched in the regulation of transcription (GO:0006355, p = 1.91E-13) for the BP category, in DNA binding (GO:0003676, p = 5.75E-12) for the MF category, and in the nucleoplasm (GO:0005654, p = 1.24E-21) for the CC category (Table 3 and Figure 3A). Similarly, from the GSE31821 dataset of patients with AF and healthy subjects, the DEGs were mostly enriched in extracellular matrix organization (GO:0030198, p = 2.07E-06) for the BP category, in cadherin binding involved in cell-cell adhesion (GO:0098641, p = 1.62E-06) for the MF category, and in the extracellular exosome (GO:0070062, p = 4.89E-11) for the CC category (Table 3 and Figure 3B). Together, these data demonstrate that the DEGs identified from the patients with CAD or AF were enriched differentially with respect to BP, MF, and CC.
However, when the 21 common DEGs were analyzed, the resulting GO terms differed from those above. Specifically, for BP, the common DEGs were particularly enriched in the regulation of the labyrinthine layer in embryonic blood vessels; for MF, they were remarkably enriched in protein binding; and for CC, they were substantially enriched in the intracellular cytoplasm (Table 4), implying that these terms may reflect a common pathogenesis in the development of CAD and AF.

Numerous but not common pathways were detected in both CAD and AF
Next, we used DAVID to map the KEGG pathways of the identified DEGs from both datasets. Briefly, from the GSE71226 dataset of patients with CAD and healthy subjects, four key pathways were determined: 1) mRNA surveillance pathway; 2) eukaryotic ribosome biogenesis pathway; 3) glucagon signaling pathway; and 4) other types of O-glycan biosynthesis (Table 5). By contrast, from the GSE31821 dataset of patients with AF and healthy subjects, twenty-one key pathways were discovered, including 1) Focal adhesion; 2) MAPK; 3) Amoebiasis; 4) Cancer; 5) PI3K-Akt; 6) Wnt signaling; 7) ECM-receptor interaction; 8) Platelet activation; 9) Toxoplasmosis; and 10) Proteoglycans in cancer (see Table 5 for the remaining 11 pathways). Interestingly, only the hematopoietic cell lineage signaling pathway was enriched by the common DEGs, and this enrichment did not reach statistical significance (p = 0.085), implying that the hematopoietic cell lineage signaling pathway may be an interactive linkage between CAD and AF that involves membrane metallo-endopeptidase (MME, also known as Neprilysin or Neutral endopeptidase 24.11) (Sankhe et al., 2020) and TfR1.

Protein-protein interaction network and molecular analysis
PPIs, whether through strong or weak physical or functional interactions, play fundamental roles in cellular functions and biological processes of all organisms under normal conditions and during disease development. In this respect, the present study used STRING to mine all proteins encoded by the DEGs for potential interactions within and between the datasets from the patients with CAD and/or AF. Specifically, from the GSE71226 dataset of patients with CAD and healthy subjects, an intricate PPI network was recognized by STRING analysis. Because the network is so complex, as shown in Figure 4A, it is difficult to decode the network(s) of interest directly; thus, we screened 2 functional modules in the network with the help of Cytoscape software. The results showed that Module A (Figure 4B, the left side) and Module B (Figure 4B, the right side) contain 13 and 12 nodes, respectively. Among them, U2 snRNP-associated SURP motif-containing protein (U2SURP, an RNA binding protein (De Maio et al., 2018)), Luc7-like protein 3 (LUC7L3, a DNA/RNA binding protein (Tufarelli et al., 2001)), and Pinin (PNN, a DNA/RNA binding protein (Hsu et al., 2020)) are the most important nodes in Module A, while in Module B, Glycophorin-C (GYPC, an erythrocyte regulatory protein (Jaskiewicz et al., 2018)), Protein 4.1 (EPB41 or Beatty's protein, an erythrocyte structural and regulatory protein (Kiyomitsu and Cheeseman, 2013)), and Alpha-hemoglobin-stabilizing protein (ALAS2, a hemoglobin regulatory protein (Che Yaacob et al., 2020)) are the most important nodes.
On the other hand, when the same approach was used on the GSE31821 dataset of patients with AF and healthy subjects, a relatively simple PPI network (containing 2 clusters with 9 nodes) was identified by STRING analysis (Figure 5). Further, Cytoscape analysis found that Module A (Figure 5B, the left side) and Module B (Figure 5B, the right side) have 4 and 3 nodes, respectively. Among these, Acyl-CoA desaturase (SCD, an enzyme involved in the biosynthesis of monounsaturated fatty acids (Vanhercke et al., 2011)), Fatty acid-binding protein (FABP4, a fatty acid transport protein (Rezar et al., 2020)), and Glycerol-3-phosphate acyltransferase 1 (GPAM, an enzyme involved in glycerolipid biosynthesis (Mitka et al., 2019)) were the most critical nodes in Module A, while Serotransferrin (TF, an iron transport protein (Jamnongkan et al., 2019)), Protein CYR61 (CYR61, a cellular growth regulatory protein (Huang et al., 2017)), and Versican core protein (VCAN, an extracellular matrix proteoglycan (Gardela et al., 2020)) were the most critical nodes in Module B.
Interestingly, when the common DEGs from the two datasets were analyzed using the same approach, it was found that only three proteins, MME, Transferrin receptor protein 1 (TfR1), and Lysosome-associated membrane glycoprotein 1 (LAMP1, an integral membrane protein with unknown function (Kirschner et al., 2016)), interacted with each other, while the remaining 18 proteins showed no significant interactions (Figure 6). Taken together, these data suggest that although the PPIs within and between the two datasets are complex and most (if not all) functional interactions remain largely unknown, three proteins (MME, TfR1, and LAMP1) may be concurrently involved in the development of CAD and AF.

DISCUSSION
In the present study, we investigated the common DEGs and molecular networks of two datasets consisting of healthy subjects and patients with CAD or AF using various bioinformatic tools. Overall, 565 up-regulated and 1367 down-regulated genes were discovered in the dataset from the patients with CAD, while 293 up-regulated and 68 down-regulated genes were revealed in the dataset from the patients with AF. From these genes, 21 common DEGs were identified that were highly enriched in the intracellular cytoplasm, protein binding, and labyrinthine layer of vessel in both CAD and AF patients. These common DEGs are involved in 4 pathways in the CAD dataset and 21 pathways in the AF dataset. Further analysis of those pathways identified three important proteins (MME, TfR1, and LAMP1) that were highly co-expressed in the CAD and AF patients. To the best of our knowledge, this is the first study to investigate the cross-correlation of all DEGs between the CAD and AF datasets. The findings may facilitate a better understanding of the mechanisms underlying the pathogenesis of CAD and AF.
The close relationship between CAD and AF has been well recognized in the literature, including the fact that patients with AF have a high prevalence of CAD (Michniewicz et al., 2018). It is known that genetic factors contribute importantly to both CAD and AF, and studying the genetic basis of cardiovascular disease has contributed significantly to the understanding of disease biology and the advancement of cardiovascular therapy (Yla-Herttuala and Baker, 2017). Numerous studies have identified key genes and critical modules that are associated with CAD and AF by analyzing microarray data with bioinformatic tools and platforms (Wang et al., 2016; Zhang et al., 2014). However, the genomic correlations between the two diseases have not been fully investigated.
Our study found a significant number of up- and down-regulated genes in each disease, but most of these genes may not be directly correlated between the two diseases. However, among them, 21 common DEGs were identified from the two datasets, including 14 up- and 7 down-regulated genes that could be involved in the pathogenesis of CAD and AF. At the protein level, three major candidates, MME, TfR1, and LAMP1, which are encoded by genes among those 21 common DEGs, were revealed by PPI network analysis, suggesting that these proteins may play a critical role in the development of the two diseases. MME is a 100 kD type II transmembrane glycoprotein that plays an important role by enzymatically modulating the metabolism of glucagon, enkephalins, substance P, neurotensin, oxytocin, bradykinin, and atrial natriuretic peptides (ANP) (Roques, 1998). Among these, ANP is a key peptide synthesized by the heart that plays critical regulatory roles in normal cardiovascular homeostasis and cardiovascular disease (Munagala et al., 2004). It was reported that MME is upregulated in the heart of patients with heart failure and in the neutrophils of patients in the early phase of acute myocardial infarction (Fielitz et al., 2002; Knecht et al., 2002). MME also controls the local antifibrotic peptide bradykinin by degrading it in the extracellular space of heart tissue (Fielitz et al., 2002). Our results indicate that MME is one of the common genes concurrently expressed in both CAD and AF, suggesting that MME could become a therapeutic target for AF patients with CAD.
The TFRC gene encodes a cell surface receptor, termed transferrin receptor 1 (TfR1), that is necessary for cellular iron uptake via receptor-mediated endocytosis and is essential for the function of red blood cells and the development of the nervous system (Levy et al., 1999). Both iron overload and iron deficiency, which are directly controlled by the transferrin receptor, were found to cause cardiomyopathy and heart failure (Anand and Gupta, 2018; Kremastinos and Farmakis, 2011). The present finding of highly expressed TfR1 in both CAD and AF provides additional evidence regarding the potential role of TfR1 in the pathogenesis of cardiovascular diseases.
LAMP1 is a member of a membrane glycoprotein family, and LAMP1/2 are major components of the lysosomal membrane (Eskelinen, 2006). Studies have demonstrated that LAMP1 is involved in autophagy by mediating the fusion between autophagosomes and lysosomes, but the detailed mechanism is not fully understood. It was reported that excessive autophagy induced by intracellular stress has significantly negative impacts on the development of various cardiovascular diseases, including CAD and heart failure (Martinet et al., 2007). Our study found a remarkable upregulation of LAMP1, supporting the possible involvement of LAMP1 in both CAD and AF. Surprisingly, mice with LAMP1 deficiency manifest normal lysosomal morphology and function (Andrejewski et al., 1999). The discrepancy could be due to differences in species or experimental conditions and needs further investigation.
The BP, MF, and CC are three terms commonly used in the GO enrichment analysis to reveal the involvement of genes of interest at different biological levels (Walter et al., 2015). The present study found that the common DEGs for CAD and AF that were enriched mostly in the intracellular cytosol appear to be involved in the development of the labyrinthine layer during embryonic vessel development. Rinkenberger and Werb (2000) demonstrated that the CC is involved in the labyrinthine layer of the placenta blood vessel progression and connected with cardiovascular system development; but future investigation is needed to address the biological role of the vascular labyrinthine layer in cardiovascular abnormalities, such as CAD and AF.
A signaling pathway (or signaling cascade or biochemical cascade) is a series of cellular and molecular reactions that take place in cells under normal and disease conditions, including the development of CAD and AF. By running KEGG pathway analysis, Pocai (2019) found that the mRNA surveillance pathway, ribosome biogenesis, and the glucagon signaling pathway were the major pathways that affect CAD. In the case of the glucagon pathway, Ali et al. (2015) demonstrated that glucagon administration impairs survival following ischemia in non-diabetic mice and promotes cardiomyocyte apoptosis. The present study identified 4 signaling pathways that were likely associated with CAD, supporting the above findings. By contrast, 21 pathways, such as the Focal adhesion and MAPK pathways (see Table 5), were generated from the AF dataset using the same approach. Among these, the MAPK pathway is probably the most important signaling pathway related to the pathogenesis of cardiovascular disease, including AF (Zhang et al., 2003). A study found that the MAPK pathway is involved in the occurrence of AF in patients with rheumatic heart disease after cardiac surgery by promoting atrial fibrosis (Zhang et al., 2017), which is consistent with our present finding on the MAPK pathway.
In conclusion, the present study identified 21 common DEGs out of thousands of genes in the two datasets collected from patients with CAD or AF. These common DEGs were highly enriched in the intracellular cytoplasm, protein binding, and vascular labyrinthine layer in patients. Three important protein candidates (MME, TfR1, and LAMP1) may play crucial roles in the development of both CAD and AF. We recognize that the study has limitations. The subjects in the CAD and AF datasets have different ethnic backgrounds and medical histories. The sample source and size should also be improved. Future studies using animal models with CAD and AF should be conducted to validate the hypothesis.